MySQL8でハッシュジョインする際の内部結合・外部結合のフィルタ条件評価のタイミング

内部結合の場合、等結合でない追加の条件はハッシュジョイン完了後に評価されます

#MySQL

岩田智哉

2024.08.28

リテールアプリ共創部@大阪の岩田です。

先日とある環境のSQLをチューニングする機会があったのですが、その際MySQL8でハッシュジョインを利用する場合は内部結合か外部結合かによってフィルタ条件を評価するタイミングが異なることを知りました。せっかくなので内容について紹介させて頂きます。

以後の内容についてはMySQL Community Edition 8.0.29で動作確認しています

内部結合の場合、等結合でない追加の条件はハッシュジョイン完了後に評価される

公式ドキュメントには以下のように記載されています

In cases like the one just shown, which makes use of an inner join, any extra conditions which are not equi-joins are applied as filters after the join is executed. (For outer joins, such as left joins, semijoins, and antijoins, they are printed as part of the join.) This can be seen here in the output of EXPLAIN:

機械翻訳された日本語版は以下です

内部結合を使用する前述のような場合、等価結合ではない追加の条件は、結合の実行後にフィルタとして適用されます。 (左結合、準結合、アンチ結合などの外部結合の場合は、結合の一部として出力されます。) これは、EXPLAIN の出力に表示されます

どういうことかというと以下のように記述した場合はハッシュジョイン完了後に tab1.col1 <> tab2.col2の条件を満たすレコードにフィルタされます

...略 FROM tab1 INNER JOIN tab2 ON tab1.col1 <> tab2.col2

一方で以下のように記述して外部結合した場合はtab1.col1 <> tab2.col2の条件を満たすレコードだけをハッシュジョインする動作になります。

...略 FROM tab1 LEFT JOIN tab2 ON tab1.col1 <> tab2.col2

EXPLAIN ANALYZEで確認してみる

MySQLのサンプル用データベースとして公開されているsakila databaseを使って簡単に確認してみます。まず内部結合の場合です。以下のSQLを実行してみます

EXPLAIN ANALYZE 
SELECT
    *
FROM
    city c1
INNER JOIN 
    city c2
    ON
        c1.country_id <> c2.country_id

結果は以下の通りでした。

-> Filter: (c1.country_id <> c2.country_id)  (cost=36061.86 rows=324000) (actual time=0.984..39.562 rows=346592 loops=1)
    -> Inner hash join (no condition)  (cost=36061.86 rows=324000) (actual time=0.973..18.805 rows=360000 loops=1)
        -> Table scan on c2  (cost=0.09 rows=600) (actual time=0.065..0.214 rows=600 loops=1)
        -> Hash
            -> Table scan on c1  (cost=60.75 rows=600) (actual time=0.462..0.723 rows=600 loops=1)

Inner hash joinの完了後にFilter: (c1.country_id <> c2.country_id)が評価されていることが分かります。

続いて外部結合の場合を確認してみます。INNER JOINをLEFT JOINに書き換えた以下のSQLを実行してみます。

EXPLAIN ANALYZE 
SELECT
    *
FROM
    city c1
LEFT JOIN 
    city c2
    ON
        c1.country_id <> c2.country_id

結果は以下の通りでした。

-> Left hash join (no condition), extra conditions: (c1.country_id <> c2.country_id)  (cost=36001.21 rows=360000) (actual time=0.525..24.602 rows=346592 loops=1)
    -> Table scan on c1  (cost=60.75 rows=600) (actual time=0.040..0.202 rows=600 loops=1)
    -> Hash
        -> Table scan on c2  (cost=0.10 rows=600) (actual time=0.099..0.379 rows=600 loops=1)

今度はLeft hash joinのextra conditionsとして c1.country_id <> c2.country_idを評価していることが分かります。

論理的には等価なSQLでも内部結合を外部結合に変えると高速化するケースがある

実際にチューニングを行った際の経緯を紹介します。チューニング対象となったのは以下のようなSQLでした。

SELECT
    *
FROM
  (サブクエリ1)
  CROSS JOIN
  (サブクエリ2)
  CROSS JOIN
  (サブクエリ3)
ORDER BY
   ...略
LIMIT 12000

なんとCROSS JOINが使われています...しかも2回も。仮にそれぞれのサブクエリが100件の結果を返すとするとJOIN後のレコード数は100万件に達します。

EXPLAIN ANALYZEの結果を確認したところ以下のような結果になりました

-> Limit: 12000 row(s)  (actual time=8180.235..8183.757 rows=12000 loops=1)	
    -> Sort: ソートキー DESC, limit input to 12000 row(s) per chunk  (actual time=8180.230..8183.297 rows=12000 loops=1)	
        -> Stream results  (cost=27454414593876212.00 rows=274543879828149920) (actual time=36.036..4918.028 rows=4098528 loops=1)	
            -> Inner hash join (no condition)  (cost=27454414593876212.00 rows=274543879828149920) (actual time=36.030..1036.956 rows=4098528 loops=1)	
                -> Table scan on サブクエリ1  (cost=0.01..12898.85 rows=1031708) (actual time=0.001..0.045 rows=252 loops=1)	
	…略
                -> Hash	
                    -> Inner hash join (no condition)  (cost=26610872576.63 rows=266106078335) (actual time=9.853..12.136 rows=16264 loops=1)	
                        -> Table scan on サブクエリ2  (cost=0.01..6450.68 rows=515855) (actual time=0.001..0.014 rows=76 loops=1)	
	…略
                        -> Hash	
                            -> Table scan on サブクエリ3  (cost=0.01..6450.68 rows=515855) (actual time=0.001..0.012 rows=214 loops=1)	
	…略

本来はCROSS JOINを廃止すべきなのですが、そうもいかない事情があったので他の方法で改善できないか調査しました。すると各サブクエリを実行した結果のcol1という列が2つ以上3になるレコードは抽出対象外にできることが分かりました。試しにSQLを以下のように修正しました。

SELECT
    *
FROM
  (サブクエリ1)
  CROSS JOIN
  (サブクエリ2)
  ON
      NOT(サブクエリ1.col1 = 3 AND サブクエリ2.col1 = 3)
  CROSS JOIN
  (サブクエリ3)
  ON
      NOT(サブクエリ1.col1 = 3 AND サブクエリ3.col1 = 3)
      -- 本当はサブクエリ1とサブクエリ2,サブクエリ2とサブクエリ3の組み合わせについてもチェックが必要だが簡略化のため省略
ORDER BY
   ...略
LIMIT 12000

この状態で再度EXPLAIN ANALYZEの結果を確認すると以下のようになりました

-> Limit: 12000 row(s)  (actual time=7755.775..7758.966 rows=12000 loops=1)			
    -> Sort: ソートキー DESC, limit input to 12000 row(s) per chunk  (actual time=7755.771..7758.496 rows=12000 loops=1)			
        -> Stream results  (cost=27454414593876212.00 rows=274543879828149920) (actual time=55.500..4967.274 rows=3782471 loops=1)			
            -> Filter: (((サブクエリ1.col1 <> 3) or (サブクエリ2.col2 <> 3)) and ((サブクエリ1.col1 <> 3) or (サブクエリ3.col1 <> 3)))  (cost=27454414593876212.00 rows=274543879828149920) (actual time=55.494..1458.769 rows=3782471 loops=1)			
                -> Inner hash join (no condition)  (cost=27454414593876212.00 rows=274543879828149920) (actual time=55.491..1116.366 rows=4098528 loops=1)			
                    -> Table scan on サブクエリ1  (cost=0.01..12898.85 rows=1031708) (actual time=0.001..0.037 rows=252 loops=1)			

	…略		
                    -> Hash			
                        -> Inner hash join (no condition)  (cost=26610872576.63 rows=266106078335) (actual time=19.598..23.176 rows=16264 loops=1)			
                            -> Table scan on サブクエリ2  (cost=0.01..6450.68 rows=515855) (actual time=0.001..0.020 rows=76 loops=1)			
	…略		
                            -> Hash			
                                -> Table scan on サブクエリ3  (cost=0.01..6450.68 rows=515855) (actual time=0.001..0.028 rows=214 loops=1)			
	…略

サブクエリ1,2,3全てのJOINが完了してからFilter: (((サブクエリ1.col1 <> 3) or (サブクエリ2.col2 <> 3)) and ((サブクエリ1.col1 <> 3) or (サブクエリ3.col1 <> 3)))が適用されており、デカルト積が生成される量を抑えられていません。ソート対象のレコード数が減らせたので多少高速化はしていますが、もう少し早い段階でフィルタを適用したいところです。

この段階で意図通りのアクセスパスにならなかったので、色々と調べた結果前述の公式ドキュメントの記載に行き着きました。そこでCROSS JOINをLEFT JOINに書き換え、以下のようなクエリを試してみました。

SELECT
    *
FROM
  (サブクエリ1)
  LEFT JOIN
  (サブクエリ2)
  ON
      NOT(サブクエリ1.col1 = 3 AND サブクエリ2.col1 = 3)
  LEFT JOIN
  (サブクエリ3)
  ON
      NOT(サブクエリ1.col1 = 3 AND サブクエリ3.col1 = 3)
      -- 本当はサブクエリ1とサブクエリ2,サブクエリ2とサブクエリ3の組み合わせについてもチェックが必要だが簡略化のため省略
ORDER BY
   ...略
LIMIT 12000

EXPLAIN ANALYZEの結果は以下の通りになりました。

-> Limit: 12000 row(s)  (actual time=7022.024..7025.314 rows=12000 loops=1)		
    -> Sort: ソートキー DESC, limit input to 12000 row(s) per chunk  (actual time=7022.023..7024.841 rows=12000 loops=1)		
        -> Stream results  (cost=27454387983291444.00 rows=274543879828149920) (actual time=17.988..4168.555 rows=3782471 loops=1)		
            -> Left hash join (no condition), extra conditions: ((サブクエリ1.col1 <> 3) or (サブクエリ3.col2 <> 3))  (cost=27454387983291444.00 rows=274543879828149920) (actual time=17.983..639.583 rows=3782471 loops=1)		
                -> Left hash join (no condition), extra conditions: ((サブクエリ1.col1<> 3) or (サブクエリ2.col1<> 3))  (cost=53221428898.54 rows=532211640814) (actual time=13.513..24.784 rows=50179 loops=1)		
                    -> Table scan on サブクエリ1  (cost=0.01..12898.85 rows=1031708) (actual time=0.001..0.153 rows=252 loops=1)		
	…略	
                    -> Hash		
                        -> Table scan on サブクエリ2  (cost=0.01..6450.68 rows=515855) (actual time=0.001..0.012 rows=214 loops=1)		
                -> Hash		
                    -> Table scan on サブクエリ3  (cost=0.01..6450.68 rows=515855) (actual time=0.001..0.005 rows=76 loops=1)

サブクエリ1とサブクエリ2をジョインする段階でフィルタが適用されているので、デカルト積が生成される量を30万件程度抑えられています。今回のケースではCROSS JOIN をLEFT JOINに変更しても論理的には等価ですが、アクセスパスが効率化されてパフォーマンスをある程度改善することができました。

この後他のチューニング案も合わせて実施することで、問題となっていたSQLもトータルではかなり改善することができました。